Hands-on Exercise 3b: Programming Animated Statistical Graphics with R

Author

Li Yuquan

Published

April 29, 2024

Modified

April 29, 2025

1 Overview

When telling a visually-driven data story, animated graphics tend to attract the interest of the audience and make a deeper impression than static graphics. In this hands-on exercise, we will learn how to create animated data visualisations by using the gganimate and plotly R packages. At the same time, we will also learn how to:

(i) reshape data by using tidyr package, and

(ii) process, wrangle and transform data by using dplyr package.

1.1 Basic concepts of animation

When creating animations, the plot does not actually move. Instead, many individual plots are built and then stitched together as movie frames, just like an old-school flip book or cartoon. Each frame is a separate plot built from a relevant subset of the full data set (for example, the observations for one year). The sequence of subsets drives the flow of the animation once the frames are stitched back together.
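The flip-book idea can be sketched in a few lines of base R. The data frame `pop` and its values are made up for illustration; each element of `frames` holds the rows that one frame would draw.

```r
# Toy data (hypothetical values): two countries observed in three years
pop <- data.frame(
  Country = rep(c("A", "B"), times = 3),
  Year    = rep(c(2000L, 2010L, 2020L), each = 2),
  Old     = c(5, 10, 6, 12, 8, 15),
  Young   = c(40, 30, 38, 27, 35, 24)
)

# One subset of rows per year: each subset is the data behind one frame
frames <- split(pop, pop$Year)

# Number of frames, and the rows that make up the first frame
length(frames)
frames[["2000"]]
```

Packages such as gganimate and plotly do this splitting internally, and also generate the in-between frames that make the motion look smooth.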

1.2 Terminology

Before we dive into the steps for creating an animated statistical graph, it’s important to understand some of the key concepts and terminology related to this type of visualization.

  1. Frame: In an animated line graph, each frame represents a different point in time or a different category. When the frame changes, the data points on the graph are updated to reflect the new data.

  2. Animation Attributes: The animation attributes are the settings that control how the animation behaves. For example, you can specify the duration of each frame, the easing function used to transition between frames, and whether to start the animation from the current frame or from the beginning.
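To make the idea of an easing function concrete, the sketch below interpolates a single value between two frames in base R. The start and end values are made up, and `ease_cubic_in_out()` is a hand-written illustration of one common easing curve, not the code used internally by gganimate or plotly.

```r
# Position of a point at the start and end of one transition (made-up values)
start <- 10
end   <- 20

# Progress through the transition, from 0 (start) to 1 (end), in 5 steps
t <- seq(0, 1, length.out = 5)

# Linear easing: constant speed
ease_linear <- function(t) t

# Cubic in-out easing: slow start, fast middle, slow end
ease_cubic_in_out <- function(t) {
  ifelse(t < 0.5, 4 * t^3, 1 - (-2 * t + 2)^3 / 2)
}

# Interpolated positions for each in-between frame
linear_pos <- start + (end - start) * ease_linear(t)
cubic_pos  <- start + (end - start) * ease_cubic_in_out(t)

data.frame(t, linear_pos, cubic_pos)
```

Both curves start and end at the same positions; they differ only in how the distance is distributed over the in-between frames.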

Tip

Before creating animated graphs, it is important to consider whether the effort is justified. While animation may not significantly enhance exploratory data analysis, it can be highly effective in presentations by helping the audience engage with the topic more deeply than static visuals do.

2 Getting Started

We use p_load() from the pacman package to check whether the following R packages are installed, install any that are missing, and load them:

Package     Description
plotly      R library for plotting interactive statistical graphs.
gganimate   A ggplot2 extension for creating animated statistical graphs.
gifski      Converts video frames to GIF animations using efficient cross-frame palettes and temporal dithering.
gapminder   An excerpt of the data available at Gapminder.org.
tidyverse   A family of modern R packages designed for data science, analysis and communication tasks.
readxl      Imports Excel worksheets into R.
ggrepel     Provides ggplot2 geoms that repel overlapping text labels.

pacman::p_load(readxl, gifski, gapminder,
               plotly, gganimate, tidyverse, ggrepel)

In this hands-on exercise, the Data worksheet from the GlobalPopulation Excel workbook will be used.

Write a code chunk to import the Data worksheet from the GlobalPopulation Excel workbook by using an appropriate R package from the tidyverse family.

# Option 1: mutate_at() (superseded since dplyr 1.0.0, but still works)
col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  mutate_at(col, as.factor) %>%
  mutate(Year = as.integer(Year))

# Option 2: across(), the current dplyr idiom; gives the same result
col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  mutate(across(all_of(col), as.factor)) %>%
  mutate(Year = as.integer(Year))
  • read_xls() of the readxl package is used to import the Excel worksheet.

  • mutate_at() or across() of the dplyr package is used to convert the character columns listed in col (Country and Continent) into factors.

    • Both variants apply as.factor() to each column named in col. Columns can be given by name (as strings) or by position. The result is a data frame in which each selected column has been replaced by its converted version; the other columns are left unchanged.

    • The function to apply is passed as the second argument (.funs in mutate_at(), .fns in across()); here it is as.factor.

  • mutate() of the dplyr package is used to convert the values of the Year field into integers.

    • Related base R coercion functions include as.character(x), as.integer(x), as.numeric(x) and as.factor(x) (for categorical data).
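The conversion steps described above can be tried on a small data frame without the Excel file. The tibble `toy` and its values are hypothetical and are used only to show the effect of across() and as.integer().

```r
library(dplyr)

# Toy data frame (hypothetical values) with two character columns
toy <- tibble(
  Country   = c("A", "B", "C"),
  Continent = c("Asia", "Asia", "Europe"),
  Year      = c(2000, 2010, 2020)
)

cols <- c("Country", "Continent")

toy_clean <- toy %>%
  mutate(across(all_of(cols), as.factor)) %>%  # replace each listed column with a factor
  mutate(Year = as.integer(Year))              # double -> integer

# Column classes after the conversion
sapply(toy_clean, class)
```

The number of columns stays the same, since the selected columns are overwritten rather than added.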
head(globalPop)
# A tibble: 6 × 6
  Country      Year Young   Old Population Continent
  <fct>       <int> <dbl> <dbl>      <dbl> <fct>    
1 Afghanistan  1996  83.6   4.5     21560. Asia     
2 Afghanistan  1998  84.1   4.5     22913. Asia     
3 Afghanistan  2000  84.6   4.5     23898. Asia     
4 Afghanistan  2002  85.1   4.5     25268. Asia     
5 Afghanistan  2004  84.5   4.5     28514. Asia     
6 Afghanistan  2006  84.3   4.6     31057  Asia     
glimpse(globalPop)
Rows: 6,204
Columns: 6
$ Country    <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
$ Year       <int> 1996, 1998, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014,…
$ Young      <dbl> 83.6, 84.1, 84.6, 85.1, 84.5, 84.3, 84.1, 83.7, 82.9, 82.1,…
$ Old        <dbl> 4.5, 4.5, 4.5, 4.5, 4.5, 4.6, 4.6, 4.6, 4.6, 4.7, 4.7, 4.7,…
$ Population <dbl> 21559.9, 22912.8, 23898.2, 25268.4, 28513.7, 31057.0, 32738…
$ Continent  <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia,…
str(globalPop)
tibble [6,204 × 6] (S3: tbl_df/tbl/data.frame)
 $ Country   : Factor w/ 222 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Year      : int [1:6204] 1996 1998 2000 2002 2004 2006 2008 2010 2012 2014 ...
 $ Young     : num [1:6204] 83.6 84.1 84.6 85.1 84.5 84.3 84.1 83.7 82.9 82.1 ...
 $ Old       : num [1:6204] 4.5 4.5 4.5 4.5 4.5 4.6 4.6 4.6 4.6 4.7 ...
 $ Population: num [1:6204] 21560 22913 23898 25268 28514 ...
 $ Continent : Factor w/ 6 levels "Africa","Asia",..: 2 2 2 2 2 2 2 2 2 2 ...
summary(globalPop)
        Country          Year          Young             Old       
 Afghanistan:  28   Min.   :1996   Min.   : 15.50   Min.   : 1.00  
 Albania    :  28   1st Qu.:2010   1st Qu.: 25.70   1st Qu.: 6.90  
 Algeria    :  28   Median :2024   Median : 34.30   Median :12.80  
 Andorra    :  28   Mean   :2023   Mean   : 41.66   Mean   :17.93  
 Angola     :  28   3rd Qu.:2038   3rd Qu.: 53.60   3rd Qu.:25.90  
 Anguilla   :  28   Max.   :2050   Max.   :109.20   Max.   :77.10  
 (Other)    :6036                                                  
   Population                Continent   
 Min.   :      3.3   Africa       :1568  
 1st Qu.:    605.9   Asia         :1454  
 Median :   5771.6   Europe       :1344  
 Mean   :  34860.9   North America: 976  
 3rd Qu.:  22711.0   Oceania      : 526  
 Max.   :1807878.6   South America: 336  
                                         

3 Animated Data Visualisation: gganimate methods

gganimate extends the grammar of graphics as implemented by ggplot2 to include the description of animation. It does this by providing a range of new grammar classes that can be added to the plot object in order to customise how it should change with time.

  • transition_*() defines how the data should be spread out and how it relates to itself across time.

  • view_*() defines how the positional scales should change along the animation.

  • shadow_*() defines how data from other points in time should be presented in the given point in time.

  • enter_*()/exit_*() defines how new data should appear and how old data should disappear during the course of the animation.

  • ease_aes() defines how different aesthetics should be eased during transitions.
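As a rough illustration of how these grammar classes combine, the sketch below adds one function from each family to a small plot. The data frame `toy` and its values are made up, and the particular choices (view_follow(), shadow_wake(), enter_fade(), exit_fade()) are only examples of each family, not a recommended recipe.

```r
library(ggplot2)
library(gganimate)

# Toy data (hypothetical values): two points tracked over three years
toy <- data.frame(
  id   = rep(c("A", "B"), times = 3),
  year = rep(c(2000L, 2010L, 2020L), each = 2),
  x    = c(1, 2, 2, 3, 4, 5),
  y    = c(5, 3, 4, 4, 2, 6)
)

anim <- ggplot(toy, aes(x, y, colour = id, group = id)) +
  geom_point(size = 4) +
  transition_time(year) +            # transition_*(): move through the years
  view_follow(fixed_y = TRUE) +      # view_*(): let the x-axis follow the data
  shadow_wake(wake_length = 0.2) +   # shadow_*(): leave a fading trail of earlier frames
  enter_fade() + exit_fade() +       # enter_*()/exit_*(): fade points in and out
  ease_aes("cubic-in-out")           # ease_aes(): slow-fast-slow movement

# `anim` is only a specification; printing it or calling animate(anim) renders it
```

Each layer can be removed independently, which makes it easy to see what a given grammar class contributes to the final animation.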

3.1 Building a static population bubble plot

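In the code chunk below, basic ggplot2 functions are used to create a static bubble plot. country_colors is a named colour vector supplied by the gapminder package. In this static version the title placeholder {frame_time} is displayed literally; gganimate fills it in only once a transition is added.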

ggplot(globalPop, aes(x = Old, y = Young,
                      size = Population,
                      colour = Country)) +
  geom_point(alpha = 0.7,
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2,12)) +
  labs(title = 'Year: {frame_time}',
       x = '% Aged',
       y = '% Young')

3.2 Building the animated bubble plot

In the code chunk below,

  • transition_time() of gganimate is used to create a transition through distinct states in time (i.e. Year).

  • ease_aes() is used to control easing of aesthetics. The default is linear. Other methods are: quadratic, cubic, quartic, quintic, sine, circular, exponential, elastic, back, and bounce.

ggplot(globalPop, aes(x = Old, y = Young,
                      size = Population,
                      colour = Country)) +
  geom_point(alpha = 0.7,
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2,12)) +
  labs(title = 'Year: {frame_time}',
       x = '% Aged',
       y = '% Young') +
  transition_time(Year) +
  ease_aes('linear')
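Printing a gganimate object renders it with default settings. To control the number of frames, the frame rate and the output size, or to save the result, animate() and anim_save() can be used. The sketch below assumes a toy data frame with made-up values, and a hypothetical output file name written to the session's temporary directory.

```r
library(ggplot2)
library(gganimate)
library(gifski)

# Toy data (hypothetical values): two points tracked over three years
toy <- data.frame(
  id   = rep(c("A", "B"), times = 3),
  year = rep(c(2000L, 2010L, 2020L), each = 2),
  x    = c(1, 2, 2, 3, 4, 5),
  y    = c(5, 3, 4, 4, 2, 6)
)

p <- ggplot(toy, aes(x, y, colour = id)) +
  geom_point(size = 4, show.legend = FALSE) +
  labs(title = "Year: {frame_time}") +
  transition_time(year) +
  ease_aes("linear")

# Render 20 frames at 10 frames per second (a 2-second GIF)
anim <- animate(p, nframes = 20, fps = 10,
                width = 400, height = 300,
                renderer = gifski_renderer())

# Save the rendered animation; the file name is a placeholder
out_file <- file.path(tempdir(), "toy_bubble.gif")
anim_save(out_file, animation = anim)
```

Fewer frames render faster but look choppier; for a data set the size of globalPop, a small nframes value is convenient while iterating on the design.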

4 Animated Data Visualisation: plotly

In the plotly R package, both ggplotly() and plot_ly() support key frame animations through the frame argument/aesthetic. They also support an ids argument/aesthetic to ensure smooth transitions between objects with the same id (which helps facilitate object constancy).

4.1 Building an animated bubble plot: ggplotly() method

In this sub-section, we will create an animated bubble plot by using the ggplotly() method.

gg <- ggplot(globalPop,
             aes(x = Old,
                 y = Young,
                 size = Population,
                 colour = Country)) +
  geom_point(aes(size = Population,
                 frame = Year),
             alpha = 0.7,
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2,12)) +
  labs(x = '% Aged',
       y = '% Young')

ggplotly(gg)

Tip

  • Appropriate ggplot2 functions are used to create a static bubble plot. The output is then saved as an R object called gg.

  • ggplotly() is then used to convert the R graphic object into an animated SVG object.

Notice that although the show.legend = FALSE argument was used, the legend still appears on the plot. To overcome this problem, theme(legend.position='none') should be used, as shown in the plot and code chunk below.

gg <- ggplot(globalPop, 
       aes(x = Old, 
           y = Young, 
           size = Population, 
           colour = Country)) +
  geom_point(aes(size = Population,
                 frame = Year),
             alpha = 0.7) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(x = '% Aged', 
       y = '% Young') + 
  theme(legend.position='none')

ggplotly(gg)
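Another option, sketched here on toy data, is to remove the legend on the plotly side after conversion by using hide_legend() from the plotly package. The data frame `toy` and its values are made up for illustration.

```r
library(ggplot2)
library(plotly)

# Toy data (hypothetical values): two points tracked over three years
toy <- data.frame(
  id   = rep(c("A", "B"), times = 3),
  year = rep(c(2000L, 2010L, 2020L), each = 2),
  x    = c(1, 2, 2, 3, 4, 5),
  y    = c(5, 3, 4, 4, 2, 6)
)

gg_toy <- ggplot(toy, aes(x, y, colour = id)) +
  # `frame` is not a ggplot2 aesthetic, so ggplot2 warns that it is ignored;
  # ggplotly() still uses it to build the animation frames
  geom_point(aes(frame = year), size = 4)

# Convert to plotly, then switch the legend off in the plotly layout
p <- hide_legend(ggplotly(gg_toy))
p
```

This leaves the ggplot2 object untouched, which can be useful when the same gg object is also printed as a static plot with its legend.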

4.2 Building an animated bubble plot: plot_ly() method

In this sub-section, we will create an animated bubble plot by using the plot_ly() method.

bp <- globalPop %>%
  plot_ly(x = ~Old, 
          y = ~Young, 
          size = ~Population, 
          color = ~Continent,
          sizes = c(2, 100),
          frame = ~Year, 
          text = ~Country, 
          hoverinfo = "text",
          type = 'scatter',
          mode = 'markers'
          ) %>%
  layout(showlegend = FALSE)
bp
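The ids argument and the animation attributes mentioned earlier can be set in the same pipeline. The sketch below uses a toy data frame with made-up values; the timing values passed to animation_opts() and the slider prefix are arbitrary choices for illustration.

```r
library(plotly)

# Toy data (hypothetical values): two points tracked over three years
toy <- data.frame(
  id   = rep(c("A", "B"), times = 3),
  year = rep(c(2000L, 2010L, 2020L), each = 2),
  x    = c(1, 2, 2, 3, 4, 5),
  y    = c(5, 3, 4, 4, 2, 6)
)

bp_toy <- toy %>%
  plot_ly(x = ~x,
          y = ~y,
          ids = ~id,          # same id across frames -> object constancy
          frame = ~year,      # one key frame per year
          text = ~id,
          hoverinfo = "text",
          type = "scatter",
          mode = "markers") %>%
  animation_opts(frame = 1000,            # milliseconds per frame
                 transition = 500,        # milliseconds spent easing between frames
                 easing = "cubic-in-out",
                 redraw = FALSE) %>%
  animation_slider(currentvalue = list(prefix = "Year: "))

bp_toy
```

With ids set, plotly moves each marker from its old position to its new one instead of redrawing the points from scratch at every frame.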

5 Reference